
Financial timeseries prediction is a particularly hard task, and Machine Learning tools are well suited to it, particularly for the feature engineering process, predictive modeling and hyperparameter optimization. In this work we show a process to build three types of classification models: OLS regression with Elastic Net regularization, L1 Support Vector Machines and an Artificial Neural Network (Multilayer Perceptron), together with a genetic programming approach to generate, from linear features, a non-linear feature engineering process. We also found that using the sign of the OLS regression output produces very poor classification results, while the L1-SVM and ANN-MLP performed very well. Finally, we report out-of-sample classification accuracy above 90%.
Timeseries data prediction using machine learning techniques is a complex problem, one that can be framed in terms of three challenges:
First, the feature engineering and selection processes: when working with timeseries data, particular challenges arise when endogenous explanatory variables (features) are constructed, the main one being multicollinearity among candidate features. Second, in the model definition process, the bias-variance trade-off is an ever-present dilemma, along with the desire for low model complexity and high explainability. Finally, cross-validation techniques for timeseries data differ considerably from those for panel data, mainly because of the importance of the order of the data and its "memory" properties, such as the commonly observed autoregressive behavior.
In the context of machine learning models, whether for regression or classification of a target variable, one important aspect is of interest: the definition of the model as a convex or non-convex formulation of the regression/classification problem. If the model is based on a convex formulation, a unique solution is guaranteed to be found in the optimization process, whereas a non-convex formulation offers no such guarantee.
In Machine Learning applications for timeseries prediction there can be serious consequences of not having a robust modelling process, ranging from poor performance to overfitting. Possible causes include not conducting a valid feature engineering process, so the model lacks enough information to predict; a poor model definition, or worse, a very complex and hard-to-explain model; and not accounting for the "memory" in the data during the cross-validation process.
To motivate this work, with a particular focus on convex optimization theory, we take the three problems identified in the previous section (feature engineering and feature selection, model definition and hyperparameter optimization, and cross-validation) and propose the following research question, three hypotheses and a general experiment to test them.
From a convex optimization perspective, which processes should be performed in order to fit predictive linear models for financial timeseries data, using linear and non-linear endogenous features?
This work is relevant from a practical perspective because it is interesting to test whether a non-linear transformation of linear variables is sufficient to obtain a linear model that performs well enough, both from the perspective of an OLS + Elastic Net regularization problem for sign prediction, and of an SVM for classification.
Build a predictive model for financial timeseries data, using only endogenous variables, and using a classification approach.
# %%capture is to hide the results of the execution
#%%capture
# !pip install -r requirements.txt
# -- Import libraries for this notebook
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import warnings
warnings.filterwarnings("ignore")
# -- to use in google colab only
# from IPython.display import Math, HTML
# -- Import the projects scripts
import functions as fn # feature engineering and processes
import data as dt # input and out data processes
import visualizations as vs # all the plots
# -- to visualize offline plots inside the jupyter notebook
from plotly.offline import iplot
The data used in this project is the future contract price for the USD/MXN exchange rate, obtained from the Chicago Mercantile Exchange Group for a period of almost 4 years, from 2017-01-02 00:00:00 to 2020-10-30 00:00:00. We have the OHLC (Open, High, Low, Close) prices and Volume data.
# general data
data_ohlc = dt.ohlc_data.copy()
# print the first 5 rows
data_ohlc.head(5)
# print the last 5 rows
data_ohlc.tail(5)
# all data description
table_1 = data_ohlc.describe()
#
table_1
The target variable, $y_{t}$, is a discrete binary representation of the one-day price change, of the form $y_{t} \in \left\{ -1, 1\right\}$, and is calculated as $sign[close - open]$.
# produce an example dataframe
experiment_data = data_ohlc.copy()
# target variable generation
experiment_data['co'] = experiment_data['close'] - experiment_data['open']
experiment_data['co_d'] = [1 if i > 0 else -1 for i in list(experiment_data['co'])]
# shift the target variable so that features at t are paired with the outcome at t+1 (avoids direct leakage)
experiment_data['co_d'] = experiment_data['co_d'].shift(-1, fill_value=9999)
# print the first 5 rows of dataframe
experiment_data.head(5)
# dates for every fold in order to construct the 2nd plot
dates_folds = [data_ohlc.iloc[947, 0]]
# print messages in console
print('\nThe complete time period in the data is: ', len(data_ohlc), 'market* days')
print('\nThe training period is \nfrom: ', data_ohlc.iloc[0, 0], 'to:', data_ohlc.iloc[947, 0], ',',
948, 'market* days in total')
print('\nThe testing period is \nfrom: ', data_ohlc.iloc[948, 0], 'to:', data_ohlc.iloc[-1, 0], ',',
238, 'market* days in total')
# OHLC Prices with train & test vertical line division
plot_2 = vs.g_ohlc(p_ohlc=data_ohlc,
p_theme=dt.theme_plot_2,
p_vlines=dates_folds)
# show plot in explorer
iplot(plot_2)
Besides generating all the features, we conducted a standardization process for every variable. That is, $z_{k} = \frac{x_{k} - \mu}{\sigma}$
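As a minimal illustration of the standardization formula above (with a hypothetical toy feature matrix, not the project's data), scikit-learn's `StandardScaler` applies exactly $z_{k} = \frac{x_{k} - \mu}{\sigma}$ per column:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# toy feature matrix: 5 observations of 2 candidate features (hypothetical values)
x = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]])

# z_k = (x_k - mu) / sigma, applied column by column
scaler = StandardScaler()
z = scaler.fit_transform(x)

# each standardized column has zero mean and unit variance
print(z.mean(axis=0))
print(z.std(axis=0))
```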
The main objective of this project is to have a classifier that correctly predicts whether the close price at $t+1$ will be higher or lower than the close price observed at $t$.
model_fit = model.fit_transform(p_x, p_y)
# output data of the model
data = pd.DataFrame(np.round(model_fit, 6))
# parameters of the model
model_params = model.get_params()
# best programs dataframe (enumerate avoids mislabeling duplicated programs)
best_programs = {}
for i, p in enumerate(model._best_programs):
    factor_name = 'sym_' + str(i)
    best_programs[factor_name] = {'raw_fitness': p.raw_fitness_, 'reg_fitness': p.fitness_,
                                  'expression': str(p), 'depth': p.depth_, 'length': p.length_}
# formatting, drop duplicates and sort by reg_fitness
best_programs = pd.DataFrame(best_programs).T
best_programs = best_programs.drop_duplicates(subset=['expression'])
best_programs = best_programs.sort_values(by='reg_fitness', ascending=False)
# results
results = {'fit': model_fit, 'params': model_params, 'model': model, 'data': data,
           'best_programs': best_programs, 'details': model.run_details_}
In time series analysis, the lag operator $L$ operates on an element of a time series to produce the previous element. For example, given a time series variable $X_t = \left\{ X_1, X_2, ... X_N \right\}$, then $L X_t = X_{t-1} \; \forall \; t > 1$.
And the moving average is simply the average of a particular column over a window of arbitrary size; in the case of this project, the window size (memory) of the time series is 7.
# For the autoregressive feature engineering process
p_memory = 7
# Data with autoregressive variables
data_ar = fn.f_autoregressive_features(p_data=data_ohlc, p_nmax=p_memory)
# Dependent variable (Y) separation
data_y = data_ar['co_d'].copy()
# Timestamp separation
data_timestamp = data_ar['timestamp'].copy()
# Independent variables (x1, x2, ..., xn)
data_ar = data_ar.drop(['timestamp', 'co', 'co_d'], axis=1, inplace=False)
# print dataframe
data_ar
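The internals of `fn.f_autoregressive_features` are not shown here, but the lag and moving-average features it produces can be sketched with plain pandas. The series values and the window size of 3 below are hypothetical, chosen only to keep the example short (the project uses a memory of 7):

```python
import numpy as np
import pandas as pd

# toy close-price series (hypothetical values, for illustration only)
close = pd.Series([19.1, 19.3, 19.0, 19.4, 19.6, 19.5, 19.8, 20.0, 19.9, 20.1], name='close')

features = pd.DataFrame({'close': close})
memory = 3  # window size (memory); 7 in the project

# lagged features: applying the lag operator k times gives close_{t-k}
for k in range(1, memory + 1):
    features[f'lag_{k}'] = close.shift(k)

# moving average over the memory window
features['ma'] = close.rolling(window=memory).mean()

# drop the rows made incomplete by the lags and the rolling window
features = features.dropna().reset_index(drop=True)
print(features)
```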
In mathematics, the Hadamard product (also known as the element-wise product) is a binary operation that takes two matrices of the same dimensions and produces another matrix of the same dimension as the operands, where each element $i$, $j$ is the product of elements $i$, $j$ of the original two matrices.
For two matrices $A$ and $B$ of the same dimension $m \times n$, the Hadamard product $(A \circ B)$ is a matrix of the same dimensions as the operands, with the elements given by:
\begin{equation} (A \circ B)_{ij} = (A \odot B)_{ij} = (A)_{ij}(B)_{ij} \end{equation}

For matrices of different dimensions the Hadamard product is undefined.
So, since in this particular case the features we generated previously (the autoregressive features) are always of the same dimensions, because $rows = days$ and $columns = 1$, if we take two columns of the data set, that is, all of the data points of two features, the Hadamard product is defined and yields the element-wise multiplication of each row of those two feature columns.
# Data with Hadamard product variables
data_had = fn.f_hadamard_features(p_data=data_ar, p_nmax=p_memory)
# print result
data_had
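The internals of `fn.f_hadamard_features` are not shown, but the idea of pairwise element-wise products of feature columns can be sketched directly (the column names and values below are hypothetical):

```python
from itertools import combinations

import pandas as pd

# toy autoregressive features (hypothetical values)
data_ar = pd.DataFrame({'lag_1': [1.0, 2.0, 3.0],
                        'lag_2': [4.0, 5.0, 6.0],
                        'lag_3': [7.0, 8.0, 9.0]})

# Hadamard (element-wise) product of every pair of feature columns
data_had = pd.DataFrame()
for a, b in combinations(data_ar.columns, 2):
    data_had[f'{a}_hp_{b}'] = data_ar[a] * data_ar[b]

print(data_had)
```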
The following operations were performed with the previously generated features; some of the operations have an arity of 1, like the inverse.
The used symbolic operations were:
# -- -------------------------------------------------------- Symbolic Features Generator (Run it once) --- #
# run variable generating function
fun_sym = fn.symbolic_features(p_x=data_had, p_y=data_y)
print('The dimensions of the resulting DataFrame with the BEST programs: ' + str(fun_sym['best_programs'].shape), '\n')
display(fun_sym['best_programs'])
# info about symbolic variables
best_p = list(fun_sym['best_programs'].index)
data_sym = fun_sym['data'][best_p]
display(data_sym.head(5), data_sym.tail(5))
# symbolic expressions (equations) for the generated variables
print('\nA very simple program: ' + str(fun_sym['best_programs'].index[0]), '\n\n', 'co_d =', fun_sym['best_programs']['expression'][0])
print('\nA not so simple program: ' + str(fun_sym['best_programs'].index[1]), '\n\n', 'co_d =', fun_sym['best_programs']['expression'][1])
print('\nA very deep program: ' + str(fun_sym['best_programs'].index[25]), '\n\n', 'co_d =', fun_sym['best_programs']['expression'][25], '\n')
# save the found symbolic features
# dt.data_save_load(p_data_objects={'features': data_sym, 'equations': eq_sym},
# p_data_action='save', p_data_file='files/features/oc_symbolic_features_11_65.dat')
# -- ------------------------------------------------------------------------------------ Load variables -- #
# -- --------------------------------------------------------------------------------------------------- -- #
# Load previously generated variables (for reproducibility purposes)
data_sym = dt.data_save_load(p_data_objects=None, p_data_action='load',
p_data_file='files/features/oc_symbolic_features_11_65.dat')
# data to use in the next stage
data = pd.concat([data_ar.copy(), data_had.copy(), data_sym['features'].copy()], axis=1)
# print concatenated data
data
# model data
model_data = dict()
# Whole data separation for train and test
xtrain, xtest, ytrain, ytest = train_test_split(data, data_y, test_size=.2, shuffle=False)
# Data division inside the dictionary
model_data['train_x'] = xtrain
model_data['train_y'] = ytrain
model_data['test_x'] = xtest
model_data['test_y'] = ytest
print('The training dataset has a length of:', len(xtrain),
'data points')
print('\nThe test dataset has a length of:', len(xtest),
'data points')
Use OLS regression with Elastic Net regularization to produce a numerical value, of which only the sign will be taken as the predicted output.
Use the L1 case of support vector machines, testing different kernels.
where:
\begin{equation} \|\beta\|_{2}^{2} = \sum_{j=1}^{p}\beta_{j}^{2} = \text{Ridge (L2)} \quad , \quad \|\beta\|_{1} = \sum_{j=1}^{p}|\beta_{j}| = \text{Lasso (L1)} \end{equation}

and the function $(1 - \alpha)\|\beta\|_{1} + \alpha\|\beta\|_{2}^{2}$ is called the elastic net penalty, which is a convex combination of the lasso and the ridge penalties.
If there is a group of several highly correlated variables, the Lasso tends to select just one variable from the group and drop the others. This can at times be very useful to overcome correlation among explanatory variables, but it can also be counterproductive, since completely dropping variables can hurt the predictive power of the features.
And if there is a group of variables with large coefficients, perhaps due to overfitting, the Ridge penalty tends to shrink all of them and produce lower values for the regressors.
$\alpha$ is the mixing ratio of the elastic net penalty applied to the Ordinary Least Squares model.
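The internals of `fn.ols_elastic_net` are not shown; as a hedged sketch of the idea (toy random data, hypothetical parameter values), scikit-learn's `ElasticNet` fits the regularized regression, where `alpha` scales the penalty strength and `l1_ratio` mixes L1 versus L2, and only the sign of the numeric output is kept as the predicted class:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 5))                                 # toy feature matrix
y = np.sign(x[:, 0] - x[:, 1] + 0.1 * rng.normal(size=200))   # toy {-1, 1} target

# elastic net: alpha scales the penalty, l1_ratio mixes Lasso (L1) vs Ridge (L2)
model = ElasticNet(alpha=0.1, l1_ratio=0.5)
model.fit(x, y)

# the regression output is numeric; only its sign is used for classification
y_pred = np.where(model.predict(x) > 0, 1, -1)
accuracy = np.mean(y_pred == y)
print(accuracy)
```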
en_parameters = {'alpha': 11.9, 'ratio': .08}
elastic_net = fn.ols_elastic_net(p_data=model_data, p_params=en_parameters)
# Model accuracy (in sample)
in_en_acc = round(elastic_net['metrics']['train']['acc']*100, 2)
print('The model accuracy with train data was: ', in_en_acc, '%')
# Model accuracy (out of sample)
out_en_acc = round(elastic_net['metrics']['test']['acc']*100, 2)
print('\nThe model accuracy with test data was: ', out_en_acc,'%')
# get train and test y data
train_y = elastic_net['results']['data']['train']
test_y = elastic_net['results']['data']['test']
# build dict for the plot
ohlc_class = {'train_y': train_y['y_train'],
'train_y_pred': train_y['y_train_pred'],
'test_y': test_y['y_test'],
'test_y_pred': test_y['y_test_pred']}
# make plot
plot_en = vs.g_ohlc_class(p_ohlc=data_ohlc,
p_theme=dt.theme_plot_3,
p_data_class=ohlc_class,
p_vlines=dates_folds)
# visualize plot
plot_en.show()
We can notice very poor results, both in the training period (left of the vertical line) and in the testing period (right of the vertical line).
Support vector machines are a supervised learning method used for classification, regression and outlier detection.
The advantages of support vector machines are:
The disadvantages of support vector machines include:
A support vector machine constructs a hyperplane, or set of hyperplanes, in a high-dimensional space (a Hilbert space). A good separation is achieved by the hyperplane that has the largest distance to the nearest training data points of any class (the so-called functional margin), since, in general, the larger the margin the lower the generalization error of the classifier.
The figure below shows the decision function for a linearly separable problem, with three samples on the margin boundaries, called support vectors.
But when the problem is not linearly separable, as in most real applications, SVMs address the non-linearly separable case by introducing two concepts: the soft margin and kernel tricks.
The support vectors are the samples within the margin boundaries, and two types of errors are tolerated:
For the kernel tricks, we will address three types of them:
Linear (linear) : $K(x_{k}, x_{l}) = \left\langle x_{k}, x_{l} \right\rangle$
Polynomial (poly) : $K(x_{k}, x_{l}) = \left( x_{k}^{T}x_{l} + c \right)^{d}$
Radial Basis Function (rbf) : $K(x_{k}, x_{l}) = \exp \left( -\frac{|| x_{k} - x_{l} ||^2}{2 \sigma^2} \right)$
We will use the non-separable case, and explore the three types of kernels to build a classifier with an SVM.
The L1 Support Vector Machines formulation for a classification problem is the following:
where: \ $y_k \in \left\{-1, 1\right\}$ : the target variables. \ $\xi_k \in \mathbb{R}^n$ : slack variables. \ $\phi(x_k)$ : feature hyperspace mapping of the form $\varphi(\cdot) : \mathbb{R}^n \rightarrow \mathbb{R}^m$. \ $w \in \mathbb{R}^m$ : model weights. \ $c > 0$ : regularization coefficient (hyperparameter). \ $\gamma > 0$ : kernel coefficient (hyperparameter). \ $b \in \mathbb{R}$ : bias term.
And the decision function for a given sample $x$ becomes:
\begin{equation} \sum_{i \in SV} y_{i} \alpha_{i} K(x_{i}, x) + b \end{equation}

and the predicted class corresponds to its sign. We only need to sum over the support vectors (i.e. the samples that lie within the margin) because the dual coefficients are zero for the other samples.
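To make the decision function concrete, the following sketch (on a hypothetical toy dataset) reconstructs it by hand from a fitted scikit-learn `SVC`: `dual_coef_` holds $y_i \alpha_i$ for the support vectors, `support_vectors_` the $x_i$, and `intercept_` the bias $b$:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 2))
y = np.where(x[:, 0] ** 2 + x[:, 1] ** 2 > 1.0, 1, -1)  # toy radial classes

model = SVC(kernel='rbf', C=1.5, gamma=0.5)
model.fit(x, y)

# manual decision function: sum over support vectors of y_i * alpha_i * K(x_i, x) + b
x_new = np.array([[0.2, -0.1]])
k = np.exp(-0.5 * np.sum((model.support_vectors_ - x_new) ** 2, axis=1))
manual = model.dual_coef_ @ k + model.intercept_

# the manual sum matches the library's decision_function, and its sign is the class
print(manual, model.decision_function(x_new))
```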
where: \ $y_k \in \left\{-1, 1\right\} \rightarrow \left\{\textit{price goes down}, \textit{price goes up} \right\}$ : the target variable. \ $\xi_k \in \mathbb{R}^n$ : used internally in $\textit{l1\_svm}$. \ $w \in \mathbb{R}^m$ : used internally in $\textit{l1\_svm}$. \ $b \in \mathbb{R}$ : used internally in $\textit{l1\_svm}$. \ $c > 0$ : inverse regularization coefficient $\in [0,1]$. \ $\phi(x_k, x_{l})$ : kernel $\in \{linear, rbf, poly\}$. \ $\gamma > 0$ : kernel coefficient for $\textit{rbf}$ and $\textit{poly}$, $\in [0, 1]$.
$\phi(x_k, x_{l})$, $\gamma$ and $c$ are the model's hyperparameters that we have to choose.
In this work, three types of kernels were tested: linear, Radial Basis Function (RBF) and polynomial.
Linear: $K(x_{k}, x_{l}) = \left\langle x_{k}, x_{l} \right\rangle$
l1_svm_linear_params = {'kernel': 'linear', 'gamma': 'auto', 'c': 1.5, 'degree': 0, 'coef0': 0}
# gamma='scale' -> 1/(n_features * X.var())
# gamma='auto' -> 1/n_features
l1_svm_linear = fn.l1_svm(p_data=model_data, p_params=l1_svm_linear_params)
# get train and test y data
train_y = l1_svm_linear['results']['data']['train']
test_y = l1_svm_linear['results']['data']['test']
# build dict for the plot
ohlc_class = {'train_y': train_y['y_train'], 'train_y_pred': train_y['y_train_pred'],
'test_y': test_y['y_test'], 'test_y_pred': test_y['y_test_pred']}
# plot title
dt.theme_plot_4['p_labels']['title'] = 'L1-SVM (LINEAR) Model Results'
# make plot
plot_svm_linear = vs.g_ohlc_class(p_ohlc=data_ohlc, p_theme=dt.theme_plot_4,
p_data_class=ohlc_class, p_vlines=dates_folds)
# visualize plot
plot_svm_linear.show()
# Model accuracy (in sample)
in_svm_acc = round(l1_svm_linear['metrics']['train']['acc']*100, 2)
print('The model accuracy with train data was: ', in_svm_acc, '%')
# Model accuracy (out of sample)
out_svm_acc = round(l1_svm_linear['metrics']['test']['acc']*100, 2)
print('\nThe model accuracy with test data was: ', out_svm_acc,'%')
# get the support vectors information
support_vectors = l1_svm_linear['model'].n_support_
print('The number of support vector for class -1 are:', support_vectors[0])
print('\nThe number of support vector for class 1 are:', support_vectors[1])
Radial Basis Function (RBF) : \
$K(x_{k}, x_{l}) = \exp \left( -\frac{|| x_{k} - x_{l} ||^2}{2 \sigma^2} \right) = \exp(-\gamma || x_{k}-x_{l} ||^2)$ \
where: \ $\gamma = \frac{1}{2 \sigma^2} > 0: \text{gamma}$
l1_svm_rbf_params = {'kernel': 'rbf', 'gamma': 'scale', 'c': 1.5, 'degree': 0, 'coef0': 0}
# gamma='scale' -> 1/(n_features * X.var())
# gamma='auto' -> 1/n_features
l1_svm_rbf = fn.l1_svm(p_data=model_data, p_params=l1_svm_rbf_params)
When training an SVM with the Radial Basis Function (RBF) kernel, two parameters must be considered:
The parameter C, common to all SVM kernels, trades off misclassification of training examples against simplicity of the decision surface. A low C makes the decision surface smooth, while a high C aims at classifying all training examples correctly.
The parameter gamma defines how much influence a single training example has. The larger gamma is, the closer other examples must be to be affected.
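Since C and gamma interact, they are typically tuned jointly. A hedged sketch (toy random data, hypothetical grid values) using `GridSearchCV` with `TimeSeriesSplit`, which keeps the folds in temporal order rather than shuffling them, in line with the cross-validation concern raised in the introduction:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.svm import SVC

rng = np.random.default_rng(2)
x = rng.normal(size=(200, 4))
y = np.where(x[:, 0] * x[:, 1] > 0, 1, -1)  # toy non-linear target

# ordered (time-respecting) folds instead of shuffled ones
cv = TimeSeriesSplit(n_splits=4)
grid = {'C': [0.1, 1.0, 10.0], 'gamma': [0.01, 0.1, 1.0]}

# exhaustive search over the C / gamma grid with the rbf kernel
search = GridSearchCV(SVC(kernel='rbf'), grid, cv=cv)
search.fit(x, y)
print(search.best_params_)
```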
# get train and test y data
train_y = l1_svm_rbf['results']['data']['train']
test_y = l1_svm_rbf['results']['data']['test']
# build dict for the plot
ohlc_class = {'train_y': train_y['y_train'], 'train_y_pred': train_y['y_train_pred'],
'test_y': test_y['y_test'], 'test_y_pred': test_y['y_test_pred']}
# plot title
dt.theme_plot_4['p_labels']['title'] = 'L1-SVM (RBF) Model Results'
# make plot
plot_svm = vs.g_ohlc_class(p_ohlc=data_ohlc, p_theme=dt.theme_plot_4,
p_data_class=ohlc_class, p_vlines=dates_folds)
# visualize plot
plot_svm.show()
Notice the good results in the training period (left of the vertical line) and the decent results in the testing period (right of the vertical line); note especially the prediction errors during the uptrend in prices.
# Model accuracy (in sample)
in_svm_acc = round(l1_svm_rbf['metrics']['train']['acc']*100, 2)
print('The model accuracy with train data was: ', in_svm_acc, '%')
# Model accuracy (out of sample)
out_svm_acc = round(l1_svm_rbf['metrics']['test']['acc']*100, 2)
print('\nThe model accuracy with test data was: ', out_svm_acc,'%')
# get the support vectors information
support_vectors = l1_svm_rbf['model'].n_support_
print('The number of support vector for class -1 are:', support_vectors[0])
print('\nThe number of support vector for class 1 are:', support_vectors[1])
Polynomial Function : \
$K(x_{k}, x_{l}) = (x_{k}^{T}x_{l} + c)^{d} = (\gamma \left\langle x_{k}, x_{l} \right\rangle + r)^{d}$ \
where: \ $\gamma > 0: \text{gamma}$ \ $d : \text{degree}$ \ $r : \text{coef0}$
l1_svm_poly_params = {'kernel': 'poly', 'gamma': 'scale', 'c': 1.5, 'degree': 2, 'coef0': 0}
# gamma='scale' -> 1/(n_features * X.var())
# gamma='auto' -> 1/n_features
l1_svm_poly = fn.l1_svm(p_data=model_data, p_params=l1_svm_poly_params)
One problem with the polynomial kernel is that it may suffer from numerical instability: when $x^T y + c < 1$, $K(x, y) = (x^T y + c)^d$ tends to zero with increasing $d$, whereas when $x^T y + c > 1$, $K(x, y)$ tends to infinity. The most common degree is $d = 2$ (quadratic), since larger degrees tend to overfit.
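The instability described above is easy to see numerically: for two hypothetical values of $x^T y + c$, one above and one below 1, raising to increasing degrees $d$ makes the kernel value explode or vanish:

```python
# (x.y + c)^d explodes when x.y + c > 1 and vanishes when x.y + c < 1
for base in (1.32, 0.80):       # two hypothetical values of x.y + c
    for d in (2, 10, 50):
        print(base, d, base ** d)
```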
# get train and test y data
train_y = l1_svm_poly['results']['data']['train']
test_y = l1_svm_poly['results']['data']['test']
# build dict for the plot
ohlc_class = {'train_y': train_y['y_train'], 'train_y_pred': train_y['y_train_pred'],
'test_y': test_y['y_test'], 'test_y_pred': test_y['y_test_pred']}
# plot title
dt.theme_plot_4['p_labels']['title'] = 'L1-SVM (POLYNOMIAL) Model Results'
# make plot
plot_svm = vs.g_ohlc_class(p_ohlc=data_ohlc, p_theme=dt.theme_plot_4,
p_data_class=ohlc_class, p_vlines=dates_folds)
# visualize plot
plot_svm.show()
Notice the good results in the training period (left of the vertical line) and, in contrast with the RBF kernel, the good and stable results in the testing period (right of the vertical line), especially during the uptrend in prices.
# Model accuracy (in sample)
in_svm_acc = round(l1_svm_poly['metrics']['train']['acc']*100, 2)
print('The model accuracy with train data was: ', in_svm_acc, '%')
# Model accuracy (out of sample)
out_svm_acc = round(l1_svm_poly['metrics']['test']['acc']*100, 2)
print('\nThe model accuracy with test data was: ', out_svm_acc,'%')
# get the support vectors information
support_vectors = l1_svm_poly['model'].n_support_
print('The number of support vector for class -1 are:', support_vectors[0])
print('\nThe number of support vector for class 1 are:', support_vectors[1])
It was very useful to generate the symbolic features in addition to the autoregressive and Hadamard features. We went from 4 time series (OHLC) to 300 explanatory variables, all of which were scaled with a standardization process. This was crucial since neither OLS nor SVM is scale invariant, so scaling the data is highly recommended.
We obtained better performance with a polynomial kernel, although very similar to that of the radial basis function kernel. Therefore, even though the explanatory variables are both linear and non-linear representations of the phenomenon (price movement), a non-linear kernel was still necessary; we can conclude this because the linear kernel performed very poorly.
Perhaps the most important conclusion is the following: we have validated that there is a series of considerations, steps and a general problem approach for the timeseries prediction problem. By seeking to predict the sign of the price difference, rather than the magnitude of the exchange rate change, we restated a regression problem as a classification problem, and doing so proved useful. Also, the feature engineering process, the data scaling, and the proposed models and hyperparameters were good choices, since they produced decent out-of-sample accuracy rates.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning, 2nd edition. Springer.
Vapnik, V. (1995). The Nature of Statistical Learning Theory. Springer.
Boyd, S., & Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825-2830.